Search results for "Mel-frequency cepstrum"
Showing 10 of 13 documents
Image-Evoked Affect and its Impact on EEG-Based Biometrics
2019
Electroencephalography (EEG) signals provide a representation of the brain’s activity patterns and have recently been exploited for user identification and authentication due to their uniqueness and their robustness to interception and artificial replication. Nevertheless, such signals are commonly affected by the individual’s emotional state. In this work, we examine the use of images as stimuli for acquiring EEG signals and study whether the use of images that evoke similar emotional responses leads to higher identification accuracy compared to images that evoke different emotional responses. Results show that identification accuracy increases when the system is trained with EEG recordin…
Speech Emotion Recognition Method Using Time-Stretching in the Preprocessing Phase and Artificial Neural Network Classifiers
2020
Human emotions play a significant role in understanding human behaviour. There are multiple ways of recognizing human emotions, and one of them is through human speech. This paper presents an approach for designing a Speech Emotion Recognition (SER) system for an industrial training station. While a product is being assembled, the end user's emotions can be monitored and used as a parameter for adapting the training station. The proposed method uses a phase vocoder for time-stretching and an Artificial Neural Network (ANN) for the classification of five typical emotions. As input for the ANN classifier, features like Mel Frequency Cepstral Coefficients (MFCCs), short-te…
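As an aside, the preprocessing step this abstract describes maps naturally onto librosa, whose time_stretch is itself phase-vocoder based. Below is a minimal sketch, not the paper's code; the file name, stretch factors, n_mfcc, and the mean-pooling step are assumptions for illustration.

```python
# Sketch: phase-vocoder time-stretching for augmentation, then MFCC extraction.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical recording

features = []
for rate in (0.9, 1.0, 1.1):                          # assumed stretch factors
    y_st = librosa.effects.time_stretch(y, rate=rate)  # phase vocoder inside
    mfcc = librosa.feature.mfcc(y=y_st, sr=sr, n_mfcc=13)
    features.append(mfcc.mean(axis=1))                 # fixed-length clip vector

X = np.stack(features)  # rows like these could feed an ANN classifier
```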
ASR performance prediction on unseen broadcast programs using convolutional neural networks
2018
In this paper, we address a relatively new task: prediction of ASR performance on unseen broadcast programs. We first propose a heterogeneous French corpus dedicated to this task. Two prediction approaches are compared: a state-of-the-art performance prediction based on regression (engineered features) and a new strategy based on convolutional neural networks (learnt features). We particularly focus on the combination of both textual (ASR transcription) and signal inputs. While the joint use of textual and signal features did not work for the regression baseline, the combination of inputs for CNNs leads to the best WER prediction performance. We also show that our CNN prediction remarkably …
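To make the "combination of textual and signal inputs" concrete, here is a hedged two-branch sketch in PyTorch. The layer sizes, vocabulary size, and input shapes are invented for illustration and are not taken from the paper.

```python
# Sketch: a CNN fusing an ASR-transcript branch with a spectrogram branch
# to regress a WER value.
import torch
import torch.nn as nn

class WERPredictor(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=64):
        super().__init__()
        # textual branch: embedded transcript tokens -> 1-D convolutions
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.text_conv = nn.Sequential(
            nn.Conv1d(emb_dim, 64, kernel_size=5), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))
        # signal branch: mel spectrogram (1 x mels x frames) -> 2-D convolutions
        self.sig_conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(),
            nn.AdaptiveMaxPool2d(1))
        self.head = nn.Linear(64 + 32, 1)  # regress WER from fused features

    def forward(self, tokens, spec):
        t = self.text_conv(self.emb(tokens).transpose(1, 2)).squeeze(-1)
        s = self.sig_conv(spec).flatten(1)
        return self.head(torch.cat([t, s], dim=1))

model = WERPredictor()
wer = model(torch.randint(0, 10000, (2, 100)),  # dummy token ids
            torch.randn(2, 1, 64, 300))         # dummy spectrograms
```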
Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network
2020
In this paper, we propose a model for the Environment Sound Classification Task (ESC) that consists of multiple feature channels given as input to a Deep Convolutional Neural Network (CNN) with an Attention mechanism. The novelty of the paper lies in using multiple feature channels consisting of Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), the Constant Q-transform (CQT) and Chromagram. Such multiple features have never been used before for signal or audio processing. We also employ a deeper CNN (DCNN) than previous models, consisting of spatially separable convolutions working on the time and feature domains separately. Alongside, we use atten…
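A rough sketch of such feature-channel construction with librosa follows. GFCC is omitted because librosa has no gammatone front end; the bin counts and hop length are assumptions chosen only so the channels align, not the paper's settings.

```python
# Sketch: stacking MFCC, CQT, and chroma as aligned CNN input channels.
import numpy as np
import librosa

y, sr = librosa.load("siren.wav", sr=22050)  # hypothetical ESC clip

mfcc   = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=60, hop_length=512)
cqt    = np.abs(librosa.cqt(y, sr=sr, n_bins=60, bins_per_octave=12,
                            hop_length=512))
chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_chroma=60, hop_length=512)

frames = min(f.shape[1] for f in (mfcc, cqt, chroma))
x = np.stack([f[:, :frames] for f in (mfcc, cqt, chroma)])  # (3, 60, frames)
# x can be fed to a CNN as a 3-channel "image"; an attention DCNN sits on top.
```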
Low-Power Audio Keyword Spotting using Tsetlin Machines
2021
The emergence of Artificial Intelligence (AI) driven Keyword Spotting (KWS) technologies has revolutionized human-to-machine interaction. Yet, the challenges of end-to-end energy efficiency, memory footprint and system complexity in current Neural Network (NN) powered AI-KWS pipelines have remained ever present. This paper evaluates KWS using a learning-automata-powered machine learning algorithm called the Tsetlin Machine (TM). Through a significant reduction in parameter requirements and by choosing logic- over arithmetic-based processing, the TM offers new opportunities for low-power KWS while maintaining high learning efficacy. In this paper we explore a TM-based keyword spotting (KWS) pipe…
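The TM-specific step hinted at here is booleanization: real-valued MFCCs must be thresholded into binary features before the logic-based machine can consume them. The sketch below illustrates that idea; the pyTsetlinMachine package, its hyper-parameters, and the quantile-threshold scheme are assumptions, and the paper's own pipeline may differ.

```python
# Sketch: booleanize MFCCs, then train a multi-class Tsetlin Machine.
import numpy as np
from pyTsetlinMachine.tm import MultiClassTsetlinMachine  # assumed package

def booleanize(mfcc_batch, n_levels=8):
    """Quantile-threshold each MFCC dimension into n_levels binary flags."""
    flat = mfcc_batch.reshape(len(mfcc_batch), -1)
    thresholds = np.quantile(flat, np.linspace(0.1, 0.9, n_levels), axis=0)
    # feature f becomes n_levels bits: 1 if its value >= the k-th threshold
    bits = (flat[None, :, :] >= thresholds[:, None, :]).astype(np.uint32)
    return bits.transpose(1, 0, 2).reshape(len(mfcc_batch), -1)

X_train = booleanize(np.random.randn(100, 13, 49))  # dummy MFCC "images"
y_train = np.random.randint(0, 10, 100)             # dummy keyword labels

tm = MultiClassTsetlinMachine(2000, 50, 3.9)        # clauses, T, s (assumed)
tm.fit(X_train, y_train, epochs=30)
```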
A case study on feature sensitivity for audio event classification using support vector machines
2016
Automatic recognition of multiple acoustic events is an interesting problem in machine listening that generalizes the classical speech/non-speech or speech/music classification problem. Typical audio streams contain a diversity of sound events that carry important and useful information on the acoustic environment and context. Classification is usually performed by means of hidden Markov models (HMMs) or support vector machines (SVMs) considering traditional sets of features based on Mel-frequency cepstral coefficients (MFCCs) and their temporal derivatives, as well as the energy from auditory-inspired filterbanks. However, while these features are routinely used by many systems, it is not …
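The "traditional" front end this abstract describes is easy to reproduce for comparison: MFCCs plus temporal derivatives, pooled per clip, fed to an SVM. In this hedged sketch the data paths, labels, and pooling choice are placeholders.

```python
# Sketch: MFCC + delta + delta-delta features with an SVM classifier.
import numpy as np
import librosa
from sklearn.svm import SVC

def clip_features(path):
    y, sr = librosa.load(path, sr=None)
    mfcc   = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    delta  = librosa.feature.delta(mfcc)            # first temporal derivative
    ddelta = librosa.feature.delta(mfcc, order=2)   # second derivative
    stacked = np.vstack([mfcc, delta, ddelta])      # (39, frames)
    return stacked.mean(axis=1)                     # clip-level summary vector

paths, labels = ["door_slam.wav"], ["impact"]       # hypothetical data
X = np.array([clip_features(p) for p in paths])
clf = SVC(kernel="rbf").fit(X, labels)
```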
Single-channel EEG-based subject identification using visual stimuli
2021
Electroencephalography (EEG) signals have recently been proposed as a biometric modality due to some inherent advantages over traditional biometric approaches. In this work, we studied the performance of individual EEG channels for the task of subject identification in the context of EEG-based biometrics using a recently proposed benchmark dataset that contains EEG recordings acquired under various visual and non-visual stimuli using a low-cost consumer-grade EEG device. Results showed that specific EEG electrodes provide consistently higher identification accuracy regardless of the feature and stimulus types used, while features based on the Mel Frequency Cepstral Coefficients (MFCC) provi…
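Applying MFCCs outside speech, as here, amounts to treating a single EEG channel as a (very low sample-rate) signal. The sketch below is illustrative only; the sampling rate, window sizes, and mel settings are assumptions, not the study's configuration.

```python
# Sketch: MFCC-style features from one EEG channel for identification.
import numpy as np
import librosa

eeg = np.random.randn(128 * 60).astype(np.float32)  # dummy 60 s channel @ 128 Hz
mfcc = librosa.feature.mfcc(y=eeg, sr=128, n_mfcc=13, n_mels=20,
                            n_fft=256, hop_length=128)  # 2 s windows, 1 s hop
subject_vector = mfcc.mean(axis=1)  # per-channel feature for a classifier
```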
MFCC-based Recurrent Neural Network for automatic clinical depression recognition and assessment from speech
2022
Clinical depression or Major Depressive Disorder (MDD) is a common and serious medical illness. In this paper, a deep Recurrent Neural Network-based framework is presented to detect depression and to predict its severity level from speech. Low-level and high-level audio features are extracted from audio recordings to predict the 24 scores of the Patient Health Questionnaire and the binary class of depression diagnosis. To overcome the problem of the small size of Speech Depression Recognition (SDR) datasets, expanding training labels and transferred features are considered. The proposed approach outperforms the state-of-the-art approaches on the DAIC-WOZ database with an overall accura…
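A minimal sketch of the architecture family the abstract names follows: MFCC sequences into a recurrent model with two heads, one regressing a severity score and one for the binary diagnosis. The sizes and dummy data are placeholders, not the paper's configuration.

```python
# Sketch: MFCC-sequence LSTM with regression and classification heads.
import torch
import torch.nn as nn

class DepressionRNN(nn.Module):
    def __init__(self, n_mfcc=40, hidden=128):
        super().__init__()
        self.lstm  = nn.LSTM(n_mfcc, hidden, num_layers=2, batch_first=True)
        self.score = nn.Linear(hidden, 1)   # severity score (regression)
        self.diag  = nn.Linear(hidden, 2)   # depressed / not depressed

    def forward(self, x):                   # x: (batch, frames, n_mfcc)
        _, (h, _) = self.lstm(x)
        last = h[-1]                        # final hidden state of top layer
        return self.score(last), self.diag(last)

model = DepressionRNN()
score, logits = model(torch.randn(4, 500, 40))  # dummy MFCC sequences
```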
On the Robustness of Deep Features for Audio Event Classification in Adverse Environments
2018
Deep features, responses to complex input patterns learned within deep neural networks, have recently shown great performance in image recognition tasks, motivating their use for audio analysis tasks as well. These features provide multiple levels of abstraction, which make it possible to select a sufficiently generalized layer to identify classes not seen during training. The generalization capability of such features is very useful due to the lack of complete, labeled audio datasets. However, as opposed to classical hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs), the performance impact of an acoustically adverse environment has not been evaluated in detail. In this p…
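"Deep features" in this sense are just intermediate-layer activations read out of a pretrained network. A hedged sketch using a forward hook follows; the backbone, the layer index, and rendering spectrograms as RGB images are assumptions for illustration, not this paper's setup.

```python
# Sketch: capturing an intermediate layer of a pretrained CNN as deep features.
import torch
from torchvision.models import vgg16

model = vgg16(weights="IMAGENET1K_V1").eval()
captured = {}

def hook(_module, _inp, out):
    captured["feat"] = out.flatten(1)       # keep the activation as features

model.features[28].register_forward_hook(hook)  # a late conv layer

spec_img = torch.randn(1, 3, 224, 224)      # spectrogram rendered as RGB
with torch.no_grad():
    model(spec_img)
deep_features = captured["feat"]            # usable by a shallow classifier
```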
Embedded Knowledge-based Speech Detectors for Real-Time Recognition Tasks
2006
Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-free system control. However, so far the performance of automatic speech recognition (ASR) systems is comparable to human speech recognition (HSR) only under very strict working conditions, and is in general much lower. Incorporating acoustic-phonetic knowledge into ASR design has proven a viable approach to raising ASR accuracy. Manner-of-articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as de…
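In the spirit of the detectors this abstract describes, a frame-level classifier over manner-of-articulation classes can be sketched as below. The features, labels, and network size are placeholders; the paper's detectors are embedded and almost certainly differ.

```python
# Sketch: a small neural network labeling frames with manner attributes.
import numpy as np
from sklearn.neural_network import MLPClassifier

MANNERS = ["vowel", "stop", "fricative", "approximant", "nasal", "silence"]

X = np.random.randn(5000, 39)                 # dummy per-frame MFCC(+deltas)
y = np.random.randint(0, len(MANNERS), 5000)  # dummy frame labels

detector = MLPClassifier(hidden_layer_sizes=(64,), max_iter=50).fit(X, y)
frame_manner = [MANNERS[i] for i in detector.predict(X[:3])]
```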